simulation budget
Conservative neural posterior estimation via distributionally robust training
Laplante, William, Hikida, Yuga, Dellaporta, Charita, Briol, François-Xavier, Bharti, Ayush
Simulation-based inference (SBI; Cranmer et al., 2020) is a powerful framework for inferring parameters of scientific models whose likelihood functions are unavailable or computationally prohibitive to evaluate, but for which simulating data is straightforward. The use of flexible neural conditional density estimators has substantially expanded the applicability of SBI to challenging problems, especially in fields such as particle physics (Brehmer, 2021), cognitive neuroscience (Fengler et al., 2021), economics (Dyer et al., 2024) and cosmology (Alsing et al., 2018; Jeffrey et al., 2021). Neural SBI methods rely on simulations from the scientific model to approximate intractable quantities such as the posterior, the likelihood, the likelihood-to-evidence ratio, or the score function; see Zammit-Mangion et al. (2024) for a recent review. In this work, we focus on the widely used neural posterior estimation (NPE) method (Papamakarios and Murray, 2016; Radev et al., 2022). A central practical limitation of NPE is the simulation budget required to train the conditional density estimator. As many scientific simulators are expensive to run, generating a sufficiently large training set is often the main computational bottleneck.
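To make the role of the simulation budget concrete, the usual NPE training objective can be written as follows. The notation is an assumption introduced here, not taken from the text above: $q_\phi(\theta \mid x)$ is the conditional density estimator, $p(\theta)$ the prior, $p(x \mid \theta)$ the simulator, and $N$ the number of simulated pairs, that is, the simulation budget.

```latex
\mathcal{L}(\phi)
  = -\,\mathbb{E}_{p(\theta)\,p(x \mid \theta)}\!\left[\log q_\phi(\theta \mid x)\right]
  \approx -\frac{1}{N} \sum_{i=1}^{N} \log q_\phi(\theta_i \mid x_i),
\qquad \theta_i \sim p(\theta), \quad x_i \sim p(x \mid \theta_i).
```

Each term in the sum costs one simulator run, so a small $N$ gives a noisy Monte Carlo estimate of the expectation, which is why expensive simulators make training the bottleneck.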
Truncated Marginal Neural Ratio Estimation
Parametric stochastic simulators are ubiquitous in science, often featuring high-dimensional input parameters and/or an intractable likelihood. Performing Bayesian parameter inference in this context can be challenging. We present a neural simulation-based inference algorithm which simultaneously offers simulation efficiency and fast empirical posterior testability, a combination that is unique among modern algorithms. Our approach is simulation efficient by simultaneously estimating low-dimensional marginal posteriors instead of the joint posterior and by proposing simulations targeted to an observation of interest via a prior suitably truncated by an indicator function. Furthermore, by estimating a locally amortized posterior, our algorithm enables efficient empirical tests of the robustness of the inference results. Since scientists cannot access the ground truth, these tests are necessary for trusting inference in real-world applications. We perform experiments on a marginalized version of the simulation-based inference benchmark and two complex and narrow posteriors, highlighting the simulator efficiency of our algorithm as well as the quality of the estimated marginal posteriors.
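The idea of a prior truncated by an indicator function can be illustrated with a minimal rejection-sampling sketch. Everything below is an assumption made for illustration: the uniform box prior, the helper names `sample_truncated_prior` and `box_indicator`, and the particular truncation region. The abstract does not say how the indicator is constructed.

```python
import random


def box_indicator(low, high):
    """Build an indicator that is True only inside the box [low, high] (hypothetical region)."""
    def indicator(theta):
        return all(lo <= t <= hi for t, lo, hi in zip(theta, low, high))
    return indicator


def sample_truncated_prior(n, prior_low, prior_high, indicator, rng):
    """Draw n samples from a uniform box prior restricted to where indicator(theta) is True.

    Uses plain rejection sampling: propose from the full prior, keep accepted draws.
    """
    samples = []
    while len(samples) < n:
        theta = [rng.uniform(lo, hi) for lo, hi in zip(prior_low, prior_high)]
        if indicator(theta):
            samples.append(theta)
    return samples


rng = random.Random(0)
# Full prior: uniform on [0, 1] x [-1, 1]; truncated region: [0.2, 0.4] x [-0.5, 0.5].
indicator = box_indicator([0.2, -0.5], [0.4, 0.5])
truncated_samples = sample_truncated_prior(1000, [0.0, -1.0], [1.0, 1.0], indicator, rng)
```

Only parameters inside the truncated region would be passed to the simulator, so the budget is spent where the posterior for the observation of interest is expected to have mass.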
Adaptive Gaussian Process Search for Simulation-Based Sample Size Estimation in Clinical Prediction Models: Validation of the pmsims R Package
Olaniran, Oyebayo Ridwan, Shamsutdinova, Diana, Markham, Sarah, Zimmer, Felix, Stahl, Daniel, Forbes, Gordon, Carr, Ewan
Background: Determining an adequate sample size is essential for developing reliable and generalisable clinical prediction models, yet practical guidance on selecting appropriate methods remains limited. Existing analytical and simulation-based approaches often rely on restrictive assumptions and focus on mean-based criteria. We present and validate pmsims, an R package that uses Gaussian process surrogate modelling to provide a flexible and computationally efficient simulation-based framework for sample size determination across diverse prediction settings. Methods: We conducted a comprehensive simulation study with two aims. First, we compared three search engines implemented in pmsims: a Gaussian process-based adaptive method, a deterministic bisection method, and a hybrid approach, across binary, continuous, and survival outcomes. Second, we benchmarked the best-performing pmsims engine against existing analytical (pmsampsize) and simulation-based (samplesizedev) methods, evaluating recommended sample sizes, computational time, and achieved performance on large independent validation datasets. Results: The Gaussian process-based method consistently produced the most stable sample size estimates, particularly in low-signal, high-dimensional settings. In benchmarking, pmsims achieved performance close to prespecified targets across all outcome types, matching simulation-based approaches and outperforming analytical methods in more challenging scenarios. Conclusions: pmsims provides an efficient and flexible framework for principled sample size planning in clinical prediction modelling, requiring fewer model evaluations than non-adaptive simulation approaches.
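As a rough illustration of what a deterministic bisection search over sample size involves, the sketch below finds the smallest sample size at which a performance measure reaches a target. It is not the pmsims implementation or API. The function names, the toy learning curve, and the assumption that performance is noise-free and non-decreasing in the sample size are all hypothetical.

```python
def bisect_sample_size(performance, target, n_min, n_max):
    """Return the smallest integer n in [n_min, n_max] with performance(n) >= target.

    Assumes performance is non-decreasing in n (a simplifying assumption).
    """
    if performance(n_max) < target:
        raise ValueError("target performance is not reached within n_max")
    lo, hi = n_min, n_max
    while lo < hi:
        mid = (lo + hi) // 2
        if performance(mid) >= target:
            hi = mid          # target met: the answer is mid or smaller
        else:
            lo = mid + 1      # target missed: the answer is larger than mid
    return lo


def toy_performance(n):
    """Hypothetical learning curve that approaches 0.80 as n grows."""
    return 0.80 - 5.0 / n


n_star = bisect_sample_size(toy_performance, target=0.75, n_min=10, n_max=10000)
```

In a real simulation-based setting each call to `performance` would itself be a noisy estimate from repeated simulations, which is one motivation for a surrogate-based search such as the Gaussian process method described above.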
OneFlowSBI: One Model, Many Queries for Simulation-Based Inference
Nautiyal, Mayank, Ju, Li, Ernfors, Melker, Hagland, Klara, Holma, Ville, Söderholm, Maximilian Werkö, Hellander, Andreas, Singh, Prashant
We introduce OneFlowSBI, a unified framework for simulation-based inference that learns a single flow-matching generative model over the joint distribution of parameters and observations. Leveraging a query-aware masking distribution during training, the same model supports multiple inference tasks, including posterior sampling, likelihood estimation, and arbitrary conditional distributions, without task-specific retraining. We evaluate OneFlowSBI on ten benchmark inference problems and two high-dimensional real-world inverse problems across multiple simulation budgets. OneFlowSBI is shown to deliver competitive performance against state-of-the-art generalized inference solvers and specialized posterior estimators, while enabling efficient sampling with few ODE integration steps and remaining robust under noisy and partially observed data.
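One way a masking distribution over query types could look is sketched below. This is a guess for illustration only: the abstract does not specify the distribution, and the function name `sample_condition_mask`, the mixture probabilities, and the convention that `True` marks a conditioned coordinate are all assumptions.

```python
import random


def sample_condition_mask(n_params, n_obs, rng, p_posterior=0.4, p_likelihood=0.4):
    """Sample a boolean mask over the joint vector [theta, x].

    True marks a coordinate that is conditioned on; False marks one to be generated.
    """
    u = rng.random()
    if u < p_posterior:
        # Posterior-style query: condition on x, generate theta.
        return [False] * n_params + [True] * n_obs
    if u < p_posterior + p_likelihood:
        # Likelihood-style query: condition on theta, generate x.
        return [True] * n_params + [False] * n_obs
    # Arbitrary conditional: condition on a random subset of coordinates.
    mask = [rng.random() < 0.5 for _ in range(n_params + n_obs)]
    if all(mask):
        # Keep at least one coordinate to generate.
        mask[rng.randrange(len(mask))] = False
    return mask


rng = random.Random(1)
masks = [sample_condition_mask(n_params=3, n_obs=5, rng=rng) for _ in range(200)]
```

During training, a mask of this kind would tell the model which coordinates to treat as given, so a single network sees posterior, likelihood, and mixed conditional queries.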